(57) Abstract

An automatic machining or assembly system including a comparator for
comparing the intensity of a pixel in a first image of a scene produced
by a sensor with a corresponding pixel, and pixels increasingly
displaced from the corresponding pixel, in a second image of the same
scene displaced with respect to said first image, and for producing
signals representing image depth, the magnitude of which is determined
by the relative displacement of compared pixels having minimum
intensity variation. The second image may be produced by a second
sensor linearly displaced from the first sensor, or by optical
diffraction means between the first sensor and the scene when rotated
to a new position.
REAL TIME GENERATION OF STEREO DEPTH MAPS

The present invention relates to industrial vision systems and
in particular to the application of such systems to the field of
flexible automation of small batch manufacturing and robotic assembly
systems.

It is a general requirement of such systems that they should be
able to recognize objects being manufactured or assembled, compare
them with the desired final configuration of the object and take
further machining or assembly actions appropriate to achieve that
final objective.

Some known industrial vision systems use a computer to store
information relating to the geometry of components to be machined or
assembled in a database. This information is compared, during the
operation of the automatic machine, with the images obtained by a
camera or other suitable sensor. The sensor measures variations in
intensity which are then compared with shaded images stored in the
computer database, and inferences are made about the type and position
of the objects in the field of view.

The disadvantages of such known systems are that variations in
ambient lighting conditions can lead to intensity and colour
variations, and that surface finish variations, reflections and shadows
all complicate comparison with the stored database. Inferring object
geometry from variations in image intensity is, therefore, a
non-trivial task. As an alternative to intensity map comparison of
objects and their stored image, recent systems under development
attempt to compare range variation of object features, hereinafter
referred to as a 'depth map', with the stored object parameters. Depth
maps may not be affected by ambient lighting variations etc. Ideally,
an industrial sensor to be used in conjunction with such a database
would, therefore, provide a depth map, independent of spurious light
intensity changes, as the primary sensor information.

A number of such systems are available or being developed based
on active lighting, scanning ultrasonic or laser rangefinders, stereo
image analysis or some combination of these techniques. It is a
requirement of the types of such systems which are to be routinely
applied in industrial robotic assembly that the sensor performance
must not degrade when analysing "difficult" images such as those
containing very repetitive features, because the automatic
manufacturing system will be expected to operate unsupervised on a
wide variety of tasks.

Ideally the sensor will be self-contained, such that it is
possible to mount it on the end of a robot arm so that the viewpoint
and the scale of the image are under program control. This has the
additional benefit that the dynamic range of such a depth mapping
sensor need not be as large as with a fixed camera system, since the
area requiring maximum depth resolution, e.g. the object being
manufactured, will usually be close to the robot end effector, and even
when this is not the case, the robot arm may be moved to the area of
interest to "take a closer look".

Each of the numerous possible approaches to obtaining a depth
map has advantages in particular applications. For instance,
scanning rangefinders can avoid ambient lighting problems but are
inherently slow and, as precision mechanical devices, are also likely
to remain expensive. Other structured light approaches can process a
large number of points in parallel but require setting up for each
application.

Stereo image analysis requires that corresponding pixels in the
two views are identified. This requires an iterative procedure which
complicates implementation in hardware, and can also give ambiguous
results in the presence of repetitive features in the image, or if
features in the image are nearly aligned with the axis of the stereo
pair.

An alternative technique for determining scene depth is "Range
from Motion". This approach can avoid the correspondence problem by
tracing features between adjacent images, but if the full six degrees
of freedom of motion are allowed, the calculations to recover the
scene depth can be complicated, particularly if the relative position
of each view is not accurately known or there is independent motion in
the scene during the image sequence capture time.

If the vision system is to be included in any continuous
feedback loops then a further requirement will be that its frame rate,
i.e. the rate at which it produces a complete depth map of the object,
must be fast enough not to limit the dynamic performance of the robot
- this will require a depth map calculation delay of less than, say,
0.1 seconds for a typical modern robot such as the IBM 7565. This will
almost certainly require a dedicated analysis microprocessor. If this
is to be conveniently achieved, then the analysis algorithm
controlling the microprocessor operation must be as simple as
possible.

An object of the present invention is to provide industrial
vision system apparatus incorporating a technique for constraining the
"range from motion" problem in order to simplify subsequent analysis
while still meeting the above requirements for the sensor to be
self-contained, and able to deal with "difficult" images.

Another object of the present invention is to provide an
industrial vision system which does not require any significant manual
setting-up for a new task, is insensitive to changes in ambient
lighting, processes information at a rate matched to the dynamic
performance of the machining or assembly tool and is not unduly
expensive.

It is a further object of the invention to provide an industrial
vision system which recognizes the position and orientation of an
object to be machined or assembled, provides dynamic feedback to
control the machine or assembly tool relative to datums in the visual
field without the need to provide complex jigs and fixtures, and with
automatic inspection of components and assemblies in comparison with
a stored specification.

According to the present invention an automatic machining or
assembly system includes a comparator for comparing the intensity of a
"pixel" in a first image of a scene produced by a sensor with a
corresponding pixel and pixels increasingly displaced from the
corresponding pixel on a second image of the same scene displaced with
respect to said first image, and for producing signals representing
image depth, the magnitude of which is determined by the relative
displacement of compared pixels having minimum intensity variation.
The first and second images may be produced by first and
second sensors respectively of a plurality of at least three sensors
linearly displaced from each other, or by rotatable optical diffraction
means between the sensor and the scene, when rotated to first and
second positions.

According to the invention in another aspect thereof an
industrial vision system includes either at least three spaced-apart
sensors each sequentially and synchronously producing a separate one
of a corresponding at least three images of a scene, or a single
sensor and means for periodically varying its field of view to produce
said at least three images, frame storage means for storing data
representing the intensity of a plurality of elements of each image as
it is produced, and comparator means for sequentially comparing the
intensity of an element of an image produced by one sensor with
selected elements of an image produced by another sensor and for
producing a range signal the magnitude of which is determined by the
relative displacement of the elements of each image giving rise to the
minimum intensity variation therebetween.

Preferably the system includes a regular geometric array of
three or more sensors and the comparator sequentially compares
intensities of pixels in the images produced by the sensors relative
to a datum pixel in the image from a datum sensor, adjusted by amounts
directly proportional to the position of the sensor in question
relative to the datum sensor. The use of at least three sensors in
such an array overcomes the potential ambiguities that a two sensor
system would give rise to with repetitive features in the scene.

The sensors may be conventional television cameras. 

A passive array of well matched television cameras (or
alternatively a single camera rapidly scanning through a fixed
sequence of positions) can give the advantages of range from motion in
avoiding any ambiguity in pixel correspondence. The virtually
simultaneous image acquisition (or short sequence capture time) of
such cameras minimises the problem of independent movement in the
scene. The fixed set of camera positions simplifies calibration by
removing the reliance on a separate motion sensing system, and careful
choice of camera positions can then minimise the complexity of the
subsequent calculations and eliminate the sensitivity to feature
orientation. In particular, the degrees of freedom of viewpoint
position can be reduced from six to two if the cameras are arranged in
a planar array normal to their collective line of sight.

In the short term, an array of television cameras would be
expensive, bulky, and difficult to calibrate, and an alternative
technique uses the variable parallax offset produced by a rotatable
block of perspex to scan a sequence of viewpoints on to a single
camera. Two configurations may then be used: a linear, and a circular
scan sequence. These are seen as practical systems in their own
right, particularly when combined with a high frame rate CCD camera.

The intensities of pixels in the images produced by the sensors
may be encoded in binary form and stored sequentially in frame stores
for later bit-by-bit comparison with the appropriate encoded
intensities of pixels produced by other sensors. Alternatively the
intensities of pixels in each image may first be transformed, using a
Hough transform technique, into a sequence of signals each having one
of three possible values: a first value ascribed when adjacent pixels
have equal intensities; a second value ascribed when the intensity of
a pixel is less than that of the previous pixel in the scan sequence;
or a third value ascribed when the intensity of a pixel is more than
that of the previous pixel in the scan sequence. The sequence of
three value signals in one image is then cross correlated with the
sequence of three value signals in another image and the pixel
displacement giving rise to minimum intensity variation determined
from the maximum correlation signal output.
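
The three-value encoding and cross correlation described above can be
sketched in a few lines of Python. This is only an illustrative sketch:
the use of -1, 0 and +1 as the three signal values, the function names
and the one-dimensional scanline input are all assumptions made for the
example, not details taken from the specification.

```python
import numpy as np

def ternary_code(line):
    """Encode a scanline as -1 / 0 / +1 (assumed values): the sign of the
    change in intensity from the previous pixel in the scan sequence."""
    return np.sign(np.diff(np.asarray(line, dtype=int)))

def best_displacement(line_a, line_b, max_shift):
    """Cross correlate the two ternary sequences for shifts 0..max_shift and
    return the shift with the largest correlation, plus all the scores."""
    a, b = ternary_code(line_a), ternary_code(line_b)
    scores = []
    for d in range(max_shift + 1):
        # Overlap a[i] with b[i + d]; matching rises or falls score +1,
        # opposite changes score -1, flat regions contribute nothing.
        scores.append(int(np.sum(a[: len(a) - d] * b[d:])))
    return int(np.argmax(scores)), scores

# Hypothetical scanlines: line_b is line_a displaced three pixels to the right.
line_a = [5, 5, 9, 9, 3, 3, 3, 7, 7, 7, 7, 7]
line_b = [5, 5, 5, 5, 5, 9, 9, 3, 3, 3, 7, 7]
shift, scores = best_displacement(line_a, line_b, max_shift=4)
```

Because only the sign of each intensity change is kept, the comparison
in this sketch is unaffected by a uniform change in brightness between
the two views.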

The invention is perhaps best understood by reference to the
analogy of scenes seen by passengers sitting at different windows
along the length of a train. If the train is moving, objects in the
scene presented to an observer at one window will be displaced in that
scene, during any given time interval, by an amount dependent on the
range of that object from the window and the observer. Thus objects
in the foreground such as sleepers in the adjacent track move rapidly
from one side of the scene to the other whilst objects in the distance
such as distant mountain ranges hardly move at all or effectively
remain at the same point in the scene.

It would indeed be possible, as has been mentioned above, to
construct such a "range from motion" imaging system using a single
sensor to track the motion of image points between a sequence of very
similar images. However, this approach is limited by the accuracy
with which the motion between the images can be measured, by the speed
of any independent motion in the scene (vehicles moving relative to
the train) and by the complexity of the calculation required to
reconstruct a depth map given the possible six degrees of freedom of
the original motion. This calculation can also be ill-conditioned,
for example, if objects are directly in the line of motion (i.e., the
train driver's view). The principal disadvantage of such a system
would be the unacceptable time it would take to construct a complete
depth map.

Returning to the analogy, if each of a number of observers on
the train sitting at different windows simultaneously measures the
position of a particular object in the plane of his window, the
co-ordinates of each object relative to each window frame will be
displaced by an amount dependent on the window separation and the
range of that object. If the object is at the same co-ordinate
position in each of the windows then clearly that object is infinitely
distant from the observer, whilst if the co-ordinates of the position
of the object vary a great deal from window to window the object
must be at a relatively short range.

The latter phenomenon is utilized in the present invention, where
the problems in the "range from motion" analysis are avoided by using
a camera array or other suitable sensor array to obtain a number of
views simultaneously. The relative position of objects in each of the
scenes produced by the cameras can therefore be established accurately
by the use of mechanical constraints, and the problem of motion within
the scene is minimal. The accuracy of this technique is maximised if
the line of sight is normal to the effective motion between views, and
its sensitivity to directional features in the scene is minimised if
the cameras are placed in a plane rather than in a line. If the
displacement of the cameras from each other, i.e., of the view-points,
is known then the images may be correlated to reveal the amplitude of
the parallax motion and hence the range.

One of the main difficulties with this correlation approach to
range measurement is its dependence on image features. Were only two
sensors to be used it would clearly be difficult to distinguish the
relative displacement of one object in its image produced by one
sensor from its image produced in the other sensor from the actual
displacement of two similar objects producing similar intensities in
the images produced by both sensors. The more camera positions there
are, the greater the possibility of removing this ambiguity becomes.
Range estimates are in any case only available where a significant
intensity gradient exists in the image; otherwise the technique
produces a sparse depth map. This situation can be improved by
creating intensity gradients in otherwise uniform surfaces by active
lighting, and as the objective is only to provide variable intensity
profiles this aspect of the system need not be accurately set up or
expensive.

Embodiments of the invention will now be described with
reference to the accompanying drawings of which:

Figure 1 shows a schematic diagram of a depth-map producing
system according to the invention.

Figure 2 is an optical ray diagram illustrating the relationship
between the position of an image point in a scene, the object point and
the camera lens in a 3-dimensional depth-map.

Figure 3 shows a schematic diagram of an alternative depth-map
producing system according to the invention.

Figure 4 shows part of a depth-map producing system according to
the invention using a single camera sensor.

Figure 5 illustrates the effect of camera spacing in a system
according to the invention.

Figure 6 illustrates an experimental set-up to demonstrate the
operation of the invention, and

Figures 7 - 12 show typical results obtained from the set-up of
Figure 6.

In Figure 1 a linear array of television cameras 1_0, 1_1, ..., 1_n,
each with identical sensitivity, produces equal area images
2_0, 2_1, ..., 2_n of a scene including two objects A and B, where A is
at an infinite range R* from the image plane 3. The cameras are
equi-spaced and separated by a distance L from each other, and the
object B is at a finite range R from the image plane 3. Each camera is
associated with a frame store 4_0, 4_1, ..., 4_n which stores in
digital form the intensities of each pixel in the images
2_0, 2_1, ..., 2_n respectively.

A comparator 5 is arranged to compare intensities of certain
pixels in corresponding lines of the scan images 2_0, 2_1, ..., 2_n. A
clock 6 and counter 7 control the operations of read-out gates
8_0, 8_1, ..., 8_n so that intensity values of corresponding pixels,
then of pixels displaced by one pixel spacing, then of pixels displaced
by two pixel spacings etc. in the images produced by adjacent cameras
are sequentially compared by the comparator 5.

The comparator 5 is arranged to produce an output signal when
and only when the variation in pixel intensities compared is zero or a
minimum. When this occurs a counter 9, reset to 0 at the start of
every comparison cycle and controlled by the clock 6, is read by a
range signal generator 10 to produce a signal corresponding to the
appropriate object range for storage in a depth-map store 11. The
contents of the depth-map store may be compared in further apparatus
(not shown) with a stored depth map model of an object to be machined
or assembled, and control signals produced accordingly to control the
machining/assembly actions of a machine.

It will be appreciated that if a particular object is at
infinity, such as object A in Figure 1, its image will occupy the same
pixel or group of pixels in each of the images 2_0, 2_1, ..., 2_n (see
the train/passenger analogy above). In this case the very first
comparison of intensities made by the comparator 5 reveals zero or
minimum intensity variation of the pixels in each of the images stored
in the frame stores 4_0, 4_1, ..., 4_n. A range signal is therefore
generated by the range generator 10 corresponding to infinite range,
i.e. infinite depth.

In general an object, such as B, will occupy a respective pixel
in each of the images 2_0, 2_1, ..., 2_n and the displacement of these
pixels relative to the pixels in the adjacent image will depend on the
inter-camera spacing and the range R of the point B from the image
plane 3. The comparator 5 will thus only produce a signal
corresponding to minimum or zero intensity variation when the read-out
gates 8_0, 8_1, ..., 8_n are controlled to compare intensity values of
pixels having the appropriate inter-image displacement. In effect the
images 2_0, 2_1, ..., 2_n are sequentially overlaid to an ever
increasing extent until all the pixels containing the image of the
point B are superimposed. At this point the comparator 5 produces an
output signal which stops the counter 9 at a count corresponding to a
range R as decoded by the range generator 10, and the range information
is fed to the depth-map store 11.

The clock rate of the clock 6 is chosen such that the complete
cycle of pixel comparisons in a line or set of lines of the images
2_0, 2_1, ..., 2_n is completed in a time sufficiently short to enable
the complete map to be stored in the store 11 in the period following
the complete frame-scan by each of the cameras 1_0, 1_1, ..., 1_n.

The data resulting from one image capture period can be
considered as a four dimensional data solid. The conventional stereo
analysis approach would identify features in each individual camera
image, then track the motion of these features between images in order
to establish the magnitude of the parallax offsets, and hence the
range. However, the data can be analysed in an orthogonal direction to
these camera images, as is now described for the case of a three
dimensional data solid (the data structure which would be obtained
from a linear array of cameras).

For the three dimensional case, a section through the data solid
orthogonal to the image plane results in a new image (hereafter
referred to as the epipolar or orthogonal image) which consists
entirely of linear structures. From inspection of Fig 2, it may be
seen by similar triangles that the offset of an image point from the
centre of an image (Xi) is related to the offset of the object from
the camera axis (Xp) by the equation:-

Xi = F.Xp/Z                                                      (1)

where F is the focal length of the camera, and (Z+F) is the
length of the normal from the object point onto the image plane. The
slope of the lines in the orthogonal image space is d(Xi)/d(Xp); it may
be seen from (1) above that this slope is inversely proportional to
the Z co-ordinate of the object point. In a planar array of cameras,
the four dimensional data solid results in an orthogonal image
hypersurface where the gradient of the hypersurface is inversely
proportional to range as above. Measurement of the slope of these
image features will, therefore, provide a cartesian depth map.

These image coordinates xi could be identical in all the sensors
if correction factors of Xi*F/Z are added to the x calculation for
each image. Under these circumstances the variance between the
intensity values of the pixel xi in each of the test images will be
zero. As F is a constant and Xi is known with some accuracy for each
sensor, the variance between the set of image intensity values
obtained can be plotted as a function of test values of Z, and the
range estimated by detecting the minimum in the resulting variance
profile. (Note that the values Xi are calculated relative to one of
the sensor images for which the correction factors are zero for all Z
- i.e. for all test values of Z there is at least one of the test
points at the final intensity, therefore it is only at one particular
range where the variance can equal zero.)
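
A minimal sketch of this variance-profile search is given below, in
Python with NumPy. It assumes a linear array in which camera k is
displaced k baselines from the datum camera, converts each test value
of Z into a whole-pixel correction using equation (1), and takes the
test depth with the smallest variance. The function name, the sign of
the parallax shift and the numerical values (focal length, baseline,
pixel spacing) are assumptions chosen only to make the example
self-contained.

```python
import numpy as np

def depth_by_variance(images, x, baseline, focal, pitch, test_depths):
    """images: array (n_cameras, width), one scanline per camera.
    Camera k is assumed to sit k * baseline from the datum camera 0.
    Returns the test depth with minimum intensity variance at datum pixel x,
    together with the whole variance profile."""
    n, width = images.shape
    k = np.arange(n)
    profile = []
    for z in test_depths:
        # Equation (1): image offset = F * Xp / Z, expressed here in pixels.
        shifts = np.rint(focal * k * baseline / (z * pitch)).astype(int)
        cols = x + shifts
        if cols.max() >= width:
            profile.append(np.inf)   # correction falls off the image
            continue
        profile.append(float(np.var(images[k, cols])))
    profile = np.array(profile)
    return test_depths[int(np.argmin(profile))], profile

# Hypothetical data: three cameras, one bright feature at a depth of 4.0 m.
images = np.zeros((3, 64))
for cam in range(3):
    images[cam, 10 + 4 * cam] = 100.0
test_depths = [8.0, 4.0, 2.0, 1.0]
z, profile = depth_by_variance(images, x=10, baseline=0.01, focal=0.016,
                               pitch=1e-5, test_depths=test_depths)
```

The test depths here were picked so that every correction is a whole
number of pixels, which mirrors the integer addressing requirement
discussed in the following paragraphs.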

For real images this process can be repeated for each pixel in
the master image to provide a full depth map. A computer algorithm
can readily be devised to implement the process. In order to arrive
at a data flow description of the chosen algorithm, the values of Xi
and the test values of Z need to be chosen such that only integer
frame store addresses are needed, and such that the data flow
algorithm has a regular structure which is easily mapped into
hardware.

The integer addressing requirement can be achieved by choosing
the values of Xi to be some integer multiple of a constant (C-^ce)
and modifying the test value sequence appropriately. The resulting
algorithm will then be image position invariant, and is therefore well
suited to parallel processing by, for example, an array processor.

Figure 3 shows a possible implementation of the algorithm as a
pipeline, data flow process in which the pixel by pixel intensity data
streams 13_0, 13_1, 13_2, ..., 13_n from each of n cameras (not shown)
in an array are fed into separate First-In-First-Out (FIFO) buffer
stores 14_0, 14_1, 14_2, ..., 14_n. The delayed outputs of the buffer
stores corresponding to a first range R_0 are added in gates
15_0, 15_1, 15_2, ..., 15_n and the resultant digital signal
representing the combined intensity of those pixels is applied to one
input of a comparator 16_0. The FIFO buffer stores 14 and adder
circuits 15 associated with a range R_0 form an R_0 unit 17. The unit
17 may, for example, add the intensities of adjacent pixels from each
of the camera images at any given time.

The data streams 13 are simultaneously fed into units 17', 17'',
17''', etc. associated with ranges R_1, R_2, R_3, etc. Each unit is
identical, having the same number of FIFO buffers 14 and adders 15,
but in each the delay between pixels added is successively increased.
The output signal from unit 17' is compared with the signal from unit
17 in the comparator 16_0. The minimum signal of the two is then
compared in comparator 16_1 with the signal from unit 17'' and so on.
The output from the final comparator circuit 16_m (if m ranges are
considered) represents the signal from the unit 17 corresponding to
the range at which there is minimum variation in the intensity of
pixels compared, and hence corresponds to a particular range R of that
point in the scene viewed by a datum camera.
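
The data flow of Figure 3 can be imitated in software as a rough
sketch. The Python below models each range unit as a bank of FIFO delay
lines, one per camera stream, and a comparator chain that keeps the
unit with the smallest output for each pixel. Several details are
assumptions made for the illustration rather than taken from the
specification: the unit output is computed as a sum of absolute
differences from the datum stream, an equalising latency is added so
that all units report on the same datum pixel at the same time, and the
class and function names are hypothetical.

```python
from collections import deque

class Fifo:
    """Delay line: push one sample, receive the sample pushed `delay` steps ago
    (None until the line has filled)."""
    def __init__(self, delay):
        self.buf = deque([None] * delay, maxlen=delay + 1)

    def push(self, sample):
        self.buf.append(sample)
        return self.buf.popleft()

def range_pipeline(streams, max_disp):
    """streams: one list of pixel intensities per camera, in raster order.
    Returns, per clock tick, the displacement whose unit gave the smallest
    variation, and the pipeline latency in pixels."""
    n = len(streams)
    latency = (n - 1) * max_disp
    # Unit d delays camera k so that its tap is datum pixel + k * d.
    units = [[Fifo(latency - k * d) for k in range(n)]
             for d in range(max_disp + 1)]
    out = []
    for t in range(len(streams[0])):
        best_d, best_var = None, None
        for d, fifos in enumerate(units):
            taps = [f.push(s[t]) for f, s in zip(fifos, streams)]
            if None in taps:
                continue                      # delay lines not yet full
            var = sum(abs(v - taps[0]) for v in taps[1:])
            if best_var is None or var < best_var:
                best_d, best_var = d, var     # comparator chain keeps the minimum
        out.append(best_d)
    return out, latency

def make_line(width, feature_at, value=9):
    line = [0] * width
    line[feature_at] = value
    return line

# Hypothetical scene: a feature displaced two pixels per camera step.
streams = [make_line(20, 5 + 2 * k) for k in range(3)]
disparities, latency = range_pipeline(streams, max_disp=3)
```

In this sketch the result for datum pixel x appears at clock tick
x + latency, so the feature placed at datum pixel 5 is reported at
tick 11.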

The hardware required for the analysis can be considerably
simplified by allowing only linear arrays of sensors, as the parallax
offsets being measured can be arranged to be parallel to the camera
raster scan row direction. A number of these linear elements can be
combined to form a 2D array of sensors. This allows the FIFO buffers to
be replaced by a series of "D type" registers. A further
simplification is possible if the image can be thresholded to reduce
the number of bits per pixel, as this would simplify the difference,
addition, and comparison operations. In the extreme case the
significant edges can be detected in the images by, for instance, zero
crossing analysis, and these binary images cross correlated to obtain
a sparse range map.

In the case of a linear array of cameras arranged normal to the
line of sight, the separation of the extreme cameras defines the
accuracy obtainable, as it would for a normal stereo pair. The
intermediate cameras establish the correspondence of pixels in the two
views in the presence of repetitive features in the image.

In order to unambiguously establish the correspondence of pixels
in the additional image of the scene with pixels already observed by a
camera or set of cameras, the actual position of a projection onto the
new image of a scene point must lie within a tolerance band of the
position expected from the image set already available. The tolerance
band is defined by the wavelength (L) of the highest spatial frequency
present in the image. This is produced by the most repetitive feature
in the scene when it is at the minimum range of interest (Zmin). It
can be shown that the spacing (Bn) of the "n-th" camera from its
predecessor is given by:-

Bn = ( Zmin * L n ) / F                                          (2)

where F is the focal length of the camera.

The maximum spatial frequency which can be detected has a
wavelength of twice the pixel spacing (S). Using this value the
series of camera spacings becomes

Bn = ( Zmin * (2 * S) n ) / F                                    (3)

The equations governing depth quantisation of a stereo pair of
cameras are:-

Zn = (B*F)/(N*S)                                                 (4)

where N = an integer displacement of the image between the two cameras
B = baseline of the stereo pair
S = pixel spacing
F = focal length
Zn = depth

The separation of the range quantisation levels is given by:-

( Zn - Zn-1 ) = B * F * [ 1 - N/(N + 1) ] / (N*S)

              = Zn ( 1 / (N + 1) )                               (5)

If, say, P% range resolution is required over a range interval Zmin
to Zmax then the maximum allowable separation of quantisation levels
(MS) is given by:-

MS = Zmax * P / 100

   = Zn [ 1 / (N + 1) ]

therefore

N = ( 100 / P ) - 1

but Zmax = B*F / (N*S)

therefore

B*F = [ (100/P) - 1 ] * Zmax * S                                 (6)

This relationship defines the maximum baseline required as a function
of the camera focal length, resolution, and the requirements for
accuracy and range. A possibility opened up by the use of a camera
array is that the camera spacing can be in constant increments, and the
focal length varied to satisfy equation 6, rather than the constant
focal length approach which is a more direct extrapolation of the
'range from motion' algorithms. This results in a more compact
sensor, but the different scale of the pixels in each view could
complicate the analysis by, for instance, requiring a rolling average
filter of long focal length to maintain the scale.

A typical requirement in a robotic assembly might be:- 

Zmax = 2.0 meters 

Zmin = 0.5 meters

P = 5 

Camera Resolution = 512 by 512 pixels. 
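
As a worked illustration of equations (4) to (6) for this requirement,
the short Python sketch below evaluates N and the product B*F. The
pixel spacing S and the focal length F are not given in the text; the
values used here (10 micrometres and 16 mm) are purely assumed for the
sake of the arithmetic, and the function names are hypothetical.

```python
def displacement_at_zmax(p_percent):
    """N from the step before equation (6): N = (100 / P) - 1."""
    return 100.0 / p_percent - 1.0

def baseline_focal_product(z_max, p_percent, pitch):
    """Equation (6): B*F = [(100/P) - 1] * Zmax * S."""
    return displacement_at_zmax(p_percent) * z_max * pitch

def range_level(n, bf, pitch):
    """Equation (4): Zn = (B*F) / (N*S)."""
    return bf / (n * pitch)

Z_MAX, Z_MIN, P = 2.0, 0.5, 5.0     # from the typical requirement above
PITCH = 10e-6                       # assumed pixel spacing S, in metres
FOCAL = 16e-3                       # assumed focal length F, in metres

n_far = displacement_at_zmax(P)                   # pixel displacement at Zmax
bf = baseline_focal_product(Z_MAX, P, PITCH)      # required B*F, in m^2
baseline = bf / FOCAL                             # baseline for the assumed F
n_near = bf / (Z_MIN * PITCH)                     # pixel displacement at Zmin
step_at_zmax = range_level(n_far, bf, PITCH) - range_level(n_far + 1, bf, PITCH)
```

With these assumed values the displacement at Zmax is 19 pixels, the
quantisation step at Zmax is 5% of Zmax as required, and the
displacement at Zmin is 76 pixels, which fits comfortably within a 512
pixel line.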

In a planar array the separation of the outermost cameras
defines the accuracy with which the object range can be determined.
The range quantisation levels Z(n) for this camera pair can be
obtained by substituting into equation (1) for the maximum baseline
(B) and the image offset (Xi = n.S where S is the pixel spacing, and n
is a positive integer) to give:-

Z(n) = (F.B)/(n.S)     (n = 1, 2, ...)                           (7)

This equation allows calculation of the furthest detectable range by
evaluation of (7) at n=1. These calculations provide a guide to the
size of array needed to fulfil a given requirement, but are not a
theoretical limit to its performance, as it may be possible to locate
edges to sub-pixel resolution in a process similar to hyperacuity in
human vision. It should be noted that the inverse law governing the
separation of quantisation levels implies that accurate range profiles
are available for a very restricted range of depths. In a practical
system, if a scanning perspex block 40 is used as shown in Figure 4 in
conjunction with a single camera 41, this range of fine sensitivity can
be adjusted during operation by use of a zoom lens, as well as by
changing the camera position.

Intermediate cameras in the array maintain the correspondence of
pixels between any two views. This can be illustrated by considering
the case of a linear array of cameras and a test object with regular
vertical lines at a fixed range. This arrangement would produce the
orthogonal image shown in Fig 5. If the camera spacing is increased
to an interval such that the image is displaced by one wavelength or
greater between successive samples, then the mapping into the
parameter space will produce ambiguous (or aliased) results as
illustrated by the set of samples marked # in Fig 5. In general, for
an extra camera on the end of a line of 'N' cameras the actual
position of an image feature in the additional image must not deviate
by more than one wavelength of the highest spatial frequency present
in the image from its expected position. This observation provides a
theoretically optimum number of cameras for any given accuracy and
resolution based on an exponentially increasing camera spacing. In
practice this theoretical optimum can only be approximately achieved,
as the use of one vote to distinguish between alternative maxima in
parameter space makes no allowance for sensor noise and inaccuracy.

Instead of digitising the intensity values of each pixel for
comparison as shown in the apparatus of Figures 1 and 3, a Hough
Transform technique may be used.

The Hough Transform is a mapping from image space into parameter
space, which was originally developed to identify the parametric form
of straight line features in images, and has since been extended to
analytic curves and arbitrary shapes. It can be applied to the
multi-dimensional data solids described above in order to find the
slope of features in the orthogonal image set. Returning to Figure 5,
it will be appreciated that the Hough Transform would accumulate more
votes for the correct range line (b) than for the aliased lines (a and
c).
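
The voting behaviour described above can be sketched in a few lines.
This is an illustration only, not the implementation of the invention:
it assumes that edge features have already been extracted from one line
of the orthogonal image as (camera position, x) pairs, and that a
feature at a given range traces the straight line x = x0 + slope *
position. The names `hough_range_votes` and `striped_features`, the
stripe period and the camera positions are all hypothetical.

```python
from collections import Counter

def hough_range_votes(features, candidate_slopes):
    """Vote for lines x = x0 + slope * k through the orthogonal image.

    features: (k, x) pairs, k = camera position, x = pixel column.
    candidate_slopes: pixel shifts per unit camera position to test.
    Returns a Counter mapping (x0, slope) to its number of votes.
    """
    votes = Counter()
    for k, x in features:
        for slope in candidate_slopes:
            # x0 is where a line of this slope would cross camera position 0.
            votes[(x - slope * k, slope)] += 1
    return votes

def striped_features(camera_positions, slope=1, period=4, x_ref=10):
    """Features of a regular striped test object at one fixed range."""
    return [(k, x_ref + slope * k + period * m)
            for k in camera_positions for m in range(-5, 6)]

slopes = [0, 1, 2]  # slope 1 is the true range line; 0 and 2 are aliases

# Widely spaced cameras: the image moves one full stripe period per camera.
sparse = hough_range_votes(striped_features([0, 4, 8]), slopes)
# The same array with intermediate cameras added.
dense = hough_range_votes(striped_features([0, 1, 2, 4, 8]), slopes)

sparse_votes = [sparse[(10, s)] for s in slopes]
dense_votes = [dense[(10, s)] for s in slopes]
print(sparse_votes)  # [3, 3, 3]
print(dense_votes)   # [3, 5, 3]
```

With only the widely spaced cameras the three candidate lines tie, which
is the aliased case. Adding the intermediate cameras gives the true line
more votes than either alias.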

It may be seen from equation (1) that the features in the
orthogonal image space corresponding to short ranges in the parameter
space may, if they are already close to the edge of that solid, exceed
the boundary of the image solid without appearing in all the images.
It may also be that they are interrupted by an occluding object.
Noise in the image, and alignment or sensor matching problems, may
cause the vote lines in parameter space to cluster rather than
intersect at exactly one point for every object point.

One of the difficulties presented by a pipeline architecture
such as that shown in Figure 3 running at video rate is that of
screen wrap-around, where the pixel for the extreme right of the
picture is followed through the pipeline by the extreme left pixel
from the line below. The range estimation algorithm involves
simultaneous comparison of pixels at different x offsets in the data
streams from a number of cameras, and the calculation must therefore
be inhibited at the end of lines. This can be done in a number of
ways:-

1) The calculation of all the intensity variance estimates can
be terminated once any one of them reaches the end of the
line.

2) The calculation of variances corresponding to the shorter
ranges requires a larger offset than the longer ranges. The
field of view given by option 1) above can therefore be
extended, but with progressively more limited range
quantisation, by only inhibiting those range calculations
which have reached the end of line.

3) The performance of option 2) can be further extended, as each
range calculation can proceed even if it normally requires
data from beyond the end of line, by assigning 'wild card'
values to these data elements. This gives a full set of
range estimates over the full field of view, but based on a
reducing number of cameras, and so the shorter range
estimates would have progressively reduced accuracy for
pixels near the end of the line.
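
The three options can be compared with a small software model. This is
a sketch under stated assumptions, not the pipeline hardware: camera k
is assumed to be sampled at column x + k * offset, offsets are assumed
to be non-negative, and 'wild card' values are modelled by simply
leaving out cameras whose column lies beyond the end of the line. The
function names are hypothetical.

```python
def variance(samples):
    """Population variance of a list of intensities."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)

def variance_at(lines, x, offset, wild_cards=False):
    """Variance across cameras for one test offset (pixels per camera step).

    Returns (variance, cameras_used), or None if the calculation is inhibited.
    """
    width = len(lines[0])
    samples = [line[x + k * offset] for k, line in enumerate(lines)
               if x + k * offset < width]
    if len(samples) < len(lines) and not wild_cards:
        return None  # a camera would need data from beyond the end of line
    if len(samples) < 2:
        return None  # fewer than two cameras left, nothing to compare
    return variance(samples), len(samples)

def range_estimates(lines, x, offsets, option):
    """Apply end-of-line option 1, 2 or 3 to every test offset at pixel x."""
    if option not in (1, 2, 3):
        raise ValueError("option must be 1, 2 or 3")
    estimates = {d: variance_at(lines, x, d, wild_cards=(option == 3))
                 for d in offsets}
    if option == 1 and any(e is None for e in estimates.values()):
        # Option 1: stop every estimate once any one reaches the end of line.
        return {d: None for d in offsets}
    return estimates

# Three cameras, 8-pixel lines, one feature at a true offset of 1 pixel
# per camera step.
lines = [
    [0, 0, 0, 0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0, 0, 9, 0],
    [0, 0, 0, 0, 0, 0, 0, 9],
]
offsets = [0, 1, 2]  # larger offsets correspond to shorter ranges
option1 = range_estimates(lines, 5, offsets, option=1)
option2 = range_estimates(lines, 5, offsets, option=2)
option3 = range_estimates(lines, 5, offsets, option=3)
print(option1)  # {0: None, 1: None, 2: None}
print(option2)  # {0: (18.0, 3), 1: (0.0, 3), 2: None}
print(option3)  # {0: (18.0, 3), 1: (0.0, 3), 2: (20.25, 2)}
```

Under option 3 each estimate carries the number of cameras that
contributed to it, so a later stage can tell which estimates rest on
fewer views.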

These options can be implemented by appending a control bit to
the data streams 13 of Figure 3 in the pipeline so that the
operational elements can detect a change of line, with options 2)
and 3) taking increasingly complex action on the speed of the
calculation hardware. It may also be possible to label pixels with
additional control bits as they proceed through the calculation so
that occlusion can be detected by labelling corresponding pixels in
the data stream from each camera. This will label images of objects
at shorter ranges before those at longer ranges; therefore the
hardware required for option 3) above could also deal with occlusion
(see Fig 8), but as the accuracy characteristics would then be a
function of the scene as well as of the image position, an additional
output indicating the number of cameras on which the pixel
correspondence was based would need to be produced.

The accuracy of range estimates from any passive sensor must
depend on the scene data. It is only where the scene data is varying
that correspondence can be established between pixels in different
views. In the proposed implementation of Figure 3, if the variance is
plotted against test range it may be shown that an isolated feature
would produce a family of variance curves as a function of x as
illustrated in Fig. 5. This can give problems in the range estimation
of isolated features as the minimum obtained is single sided. It also
gives problems because the variance is low for all range values when
away from the isolated feature and can often be lower than the
variance minimum shown at the feature.

A modified algorithm may be devised to subtract the mean
variance estimate from the variance profile. It could also reform the
data to give a symmetrical minimum by choosing a symmetrical
arrangement of cameras with the reference image in the centre. The
combined results of these operations show that for an isolated
feature, the minimum variance now has a symmetrical and flat-bottomed
profile where the width of the minimum gives a clear indication of the
tolerance on the estimated range.
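
A minimal sketch of the mean subtraction follows. It rests on
assumptions that the text does not state: the mean is taken over the
test offsets at a single pixel, the cameras sit at signed positions -1,
0 and +1 about a central reference camera, and every sampled column
lies inside the line. The function names are hypothetical.

```python
def variance(samples):
    """Population variance of a list of intensities."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)

def variance_profile(lines, positions, x, offsets):
    """Variance across cameras at pixel x for each test offset.

    positions[i] is the signed position of camera i relative to the
    reference camera (position 0), in units of the camera spacing.
    """
    return [variance([line[x + p * d] for line, p in zip(lines, positions)])
            for d in offsets]

def subtract_mean(profile):
    """Remove the mean variance so that only a relative minimum stands out."""
    mean = sum(profile) / len(profile)
    return [v - mean for v in profile]

def scan_line(width, feature_col, level=9):
    """A dark line with a single isolated bright feature."""
    line = [0] * width
    line[feature_col] = level
    return line

positions = [-1, 0, 1]   # reference camera in the centre
true_offset = 2          # pixels per camera step for the feature
lines = [scan_line(16, 6 + p * true_offset) for p in positions]
offsets = [0, 1, 2, 3]

at_feature = subtract_mean(variance_profile(lines, positions, 6, offsets))
in_flat_area = subtract_mean(variance_profile(lines, positions, 12, offsets))
print(at_feature)    # [4.5, 4.5, -13.5, 4.5]
print(in_flat_area)  # [0.0, 0.0, 0.0, 0.0]
```

Before mean subtraction both pixels reach a minimum variance of zero.
Afterwards only the pixel on the feature shows a distinct minimum, and
it lies at the true offset.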

The preceding analysis assumes an idealised sensor array. In a
real system the elements of the sensor array will not be perfectly
aligned; they will also differ in their overall sensitivity and will
not exhibit an exactly uniform response over the whole image field.
Additional practical problems are introduced when interfacing the
sensor array to the analysis hardware, as electrical noise can be
picked up on signal lines, and the quantisation levels in the analogue
to digital converters are only accurate to, say, plus or minus one
least significant bit. The analysis also assumes that the surfaces in
the scene exhibit perfect Lambertian radiation properties, i.e. the
apparent intensity of radiation from a point on a surface does not
vary with viewing angle. A combined error model can be drawn to
illustrate the interaction of these error terms.

The assumption of Lambertian reflection should be a good
approximation for most matt engineering materials such as those in an
automated assembly cell provided that the illumination of the cell is
sufficiently diffuse, particularly as the range of angles tested by a
practical array would be small (<0.1 radians, say). For a more
reflective surface, any reflection of objects or lights would be
superimposed on the reflectance characteristics of the surface. As
this is a linear process the simple amplitude correlation described so
far would not give a good variance minimum at either the range of the
surface or at this range plus the range from this surface to the
reflected object.

The reflected image is added to the normal image of the
object, but for a matt surface the reflected image will not have sharp
edges. The effect of the reflection on the analysis can therefore be
reduced by calculating the variance profile of the thresholded first
difference images rather than pure intensity images. The threshold
level would be chosen empirically to suit the polar reflectance
characteristics of the typical objects encountered. This approach has
the disadvantage of sharply reducing the number of points in the image
for which a range estimate can be made, and may therefore require a
further stage which would use the lower level information to grow
regions in the depth map which were consistent with the edge
information obtained in the first phase. A further difficulty of
using the first difference image is that if a line of sight is
tangential to an edge of a curved object then the edge will correspond
to different points on the curve from different camera positions.
This may cause small errors in the range estimation which cannot be
compensated for as might have been possible with the raw data.

For more reflective surfaces the use of the first difference
image would allow the variance algorithm to detect the range of
significant edges in both the object and in the virtual image formed
by reflection. This virtual image could be at any apparent range
depending on the curvature of the reflecting surface. Further
analysis based on edge effect, range of interest, consistency with the
world model or polarisation of reflected light would then be required
to eliminate the virtual image range measurements.

The use of the first difference images also compensates for any
DC bias between different sensors in the array, or for any slow
variation in sensitivity between different regions of the same sensor.
Unfortunately it also aggravates any random electrical noise or
quantisation effects, but as these sources of error are usually of
small amplitude, the threshold may be set to remove these terms. In
any event they are uncorrelated with the image data, so the correlation
of multiple images inherent in the analysis should filter
out any adverse effects.
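
The effect of the thresholded first difference on DC bias and small
noise can be shown on a single scan line. This is only a sketch: the
pixel values, the bias of 5 and the threshold of 3 are made up for the
illustration, and the function name is hypothetical.

```python
def thresholded_first_difference(line, threshold):
    """Differences of adjacent pixels, with small magnitudes set to zero."""
    diffs = [b - a for a, b in zip(line, line[1:])]
    return [d if abs(d) >= threshold else 0 for d in diffs]

# One strong edge (10 -> 40) with +/-1 noise on either side of it.
scene = [10, 10, 11, 10, 40, 41, 40, 40]
camera_a = scene
camera_b = [v + 5 for v in scene]  # same scene seen with a DC bias of 5

edges_a = thresholded_first_difference(camera_a, threshold=3)
edges_b = thresholded_first_difference(camera_b, threshold=3)
print(edges_a)  # [0, 0, 0, 30, 0, 0, 0]
print(edges_b)  # [0, 0, 0, 30, 0, 0, 0]
```

The bias cancels in the subtraction and the noise falls below the
threshold, so both cameras report the same single edge.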

Errors due to inaccurate alignment of the sensors can be
minimised by the use of solid state cameras which are rigidly
connected to each other after alignment calibration. This removes
drift in the control electronics as a source of alignment error such
as would have occurred with vidicon tube cameras. Due to the short
baseline of the array, the mechanical constraints can be much more
robust than is possible with a conventional stereo pair, but
mechanical shock and thermal expansion of the mountings cannot be
eliminated. For maximum accuracy it will, therefore, be necessary to
automatically update the calibration offsets using feedback from exact
object positions as they become known. For less accurate work, as the
sensor array is a single unit, calibration by the manufacturers can be
sufficient. In either case, manual intervention in setting up for a
new application is avoided. With this configuration it should be
possible to calibrate the array such that all the sensors are aligned
to within plus or minus one pixel error in both the x and y
directions. In order to prevent aliasing, the sensors will sample at
more than the highest spatial frequency present. The highest possible
spatial frequency is given by the point spread function (PSF) of the
individual cameras; any misalignment will sample different points on
this point spread function. This shows up as an amplitude error
which in the worst case is given by the alignment error multiplied by
the maximum gradient of the point spread function. This source of
error is therefore minimised by defocussing the camera array so that
the maximum spatial frequency of interest is the same wavelength as
the point spread function.

The analysis hardware can be substantially simplified by
reducing the data word length used to represent the image. The use of
the first difference image will already have reduced the dynamic range
needed. The minimum variation between pixels is defined by the
integer resolution of the intensity input, the worst case maximum by
the product of the pixel separation and the maximum gradient of a
full amplitude point spread function (PSF). For a wavelength of the
PSF equal to ten pixels this corresponds to a reduction of two bits in
dynamic range. Empirical observation of typical image statistics
could allow larger reductions.

If the expected images are high contrast and the image
statistics are well known then the bit resolution can be further
reduced down to a binary image, or in the case of a first difference
image a two bit (magnitude and sign) image. This format is chosen in
preference to the more usual edge detection or zero crossing image, as
these techniques produce a feature which is one pixel wide; any small
amplitude noise in the original image combined with a marginal edge
could easily result in an apparent lateral displacement of the edge.
This would prevent the correlation algorithm from tracking the edge.
A zero-to-one transition defining the edge is less sensitive to this
type of small error.

The alternative formulation of the technique with a scanning 
perspex block avoids many calibration problems. Sensor matching is no 
longer a problem, and the parallelism of the effective viewpoints is 
determined by the parallelism of the sides of the perspex block, which 
can be controlled to close tolerance. The remaining problem is that 
of determining the angular position of the perspex block at each image
position so that the parallax offset can be accurately calculated. 
Such a technique was used in the following experimental arrangement. 

Using apparatus as shown in Figure 4, a sequence of images may be
obtained by rotating a parallel sided block of perspex 40 through
fixed angular increments in the line of sight of a camera 41. This
produces an effective offset ($\Delta x$) given by:-

$$\Delta x = \frac{T \sin\left(\theta - \arcsin\left[\sin\theta / R\right]\right)}{\cos\left(\arcsin\left[\sin\theta / R\right]\right)}$$

where:- $\theta$ = angular position of the perspex block
T = thickness of the perspex block
R = refractive index of the perspex block (1.498)

An overhead view of a typical test scene is shown in Fig 6.
In one test carried out by the inventor this scene consisted of a
machine tool cutter 43 and a spring 42 arranged on a surface marked
out with a grid for calibration purposes, and in front of a backdrop
44 tiled with acoustic tiles. These objects were set at ranges
between 20 and 100 centimetres. The image was digitised using a 512
by 512 by 7-bit frame grabber at 32 equal angular increments of the
perspex block 40.
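
The offset at each of the 32 block positions can be tabulated as in the
sketch below. It assumes the usual refraction geometry for a parallel
sided plate, in which the ray inside the block makes an angle
arcsin(sin(theta) / R) with the normal. The block thickness (25 mm) and
the angular sweep (-20 to +20 degrees) are not given in the text and
are hypothetical values chosen only for illustration.

```python
import math

def perspex_offset(theta, thickness, refractive_index=1.498):
    """Lateral offset of the line of sight through a tilted parallel block.

    theta is the block angle in radians; the result has the units of
    thickness.
    """
    refracted = math.asin(math.sin(theta) / refractive_index)
    return thickness * math.sin(theta - refracted) / math.cos(refracted)

# Hypothetical set-up: 25 mm block, 32 equal steps from -20 to +20 degrees.
THICKNESS_MM = 25.0
angles_deg = [-20.0 + 40.0 * i / 31 for i in range(32)]
offsets_mm = [perspex_offset(math.radians(a), THICKNESS_MM)
              for a in angles_deg]

for angle, offset in zip(angles_deg[:3], offsets_mm[:3]):
    print(f"{angle:7.2f} deg -> {offset:7.3f} mm")
```

Equal angular steps do not give exactly equal offsets, which is why the
angular position of the block has to be known for each image before the
parallax can be calculated.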

An orthogonal image from the experimental data set is shown in
Figure 7, printed in a pseudo-grey scale on a conventional graphics
printer and clearly showing the parallax motion to be measured. The
Hough Transform was calculated for a sample line (Fig. 8). The
calculation was repeated for each line in the orthogonal image to
produce sparse depth maps, as shown in Figs 10, 11 and 12. An image
of the test scene from one camera alone is shown in Figure 9 for
comparison with the depth maps shown in Figures 10, 11 and 12. This
shows good sensitivity to features which are normal both to the line
of sight of the camera and to the line of traverse of the linear
array.

A second experiment has been constructed where the rotation
of the perspex block is co-axial with the line of sight of the camera,
producing a circular offset rather than a linear one. This will give
equal sensitivity irrespective of feature orientation.

The results obtained so far have shown that a planar array image
sensor can avoid some of the ambiguities inherent in the conventional
stereo image analysis by the provision of extra information. This
allows an analysis algorithm to be used to derive a depth map which is
simple, operating on binary data in a single pass. It is, therefore,
potentially convenient to implement in hardware.

The limited depth range over which accurate measurements can be
achieved with a short baseline sensor can be seen by inspection of the
range scale in Figure 8. For the dynamic control task this should not
prove a problem, as fine control is usually only required at close
range. For the general object recognition task more time can be
allowed, and the range information can therefore be used in
conjunction with other image analysis techniques, for instance to
provide scale information to constrain the search space of a shape
based algorithm.

A limitation of any passive stereo matching procedure is that
depth information can only be deduced where there are identifiable
features in the image. This effect could be avoided if the scene was
illuminated with a projected image similar to structured light, to
produce artificial edges in otherwise featureless areas. As the
analysis does not use the information of what pattern is projected, or
where from, the projection system does not have any expensive
requirements for accurate equipment or time consuming set-up
procedures.

CLAIMS

1 An automatic machining or assembly system including a comparator
for comparing the intensity of a pixel in a first image of a scene
produced by a sensor with a corresponding pixel and pixels increasingly
displaced from the corresponding pixel in a second image of the same
scene displaced with respect to said first image and for producing
signals representing image depth the magnitude of which is determined
by the relative displacement of compared pixels having minimum
intensity variation or by a second sensor linearly displaced from the
first sensor or by optical diffraction means between the first sensor
and the scene when rotated to a new position.

2 An automatic machining or assembly system as claimed in Claim 1
and wherein the first and second images are produced by first and
second sensors respectively of a plurality of at least three such
sensors linearly displaced with respect to each other.

3 An automatic machining or assembly system as claimed in Claim 1
and wherein the first and second images are produced by rotatable
optical diffraction means between the sensor and the scene when
rotated to first and second positions.

4 An industrial vision system including either at least three
spaced-apart sensors each sequentially and synchronously producing a
separate one of a corresponding plurality of at least three images of
a scene, or a single sensor and means for periodically varying its
field of view to produce said at least three images, frame storage
means storing data representing the intensity of a plurality of
elements of each image which is produced and comparator means for
sequentially comparing the intensity of an element of an image
produced by one sensor with selected elements of an image produced by
another sensor and for producing a range signal the magnitude of which
is determined by the relative displacement of the elements of each
image giving rise to the minimum intensity variation therebetween.

5 A system as claimed in any preceding claim including a regular
geometric array of three or more sensors and wherein the comparator or
comparator means compares intensities of pixels in the images produced
by the sensors relative to a datum pixel in the image from a datum
sensor adjusted by amounts directly proportional to the position of
the sensor in question relative to the datum sensor.

6 A system as claimed in any preceding claim and wherein the 
sensors are television cameras. 

7 A system as claimed in any preceding claim and wherein the
intensities of pixels in the images produced by the sensors are
encoded in binary form and stored sequentially in frame stores for
later bit-by-bit comparison with corresponding encoded pixels
produced by other sensors.

8 A system as claimed in any preceding claim and wherein the
intensities of the pixels in each image are transformed using a Hough
transform technique into parameter space for processing to find the
slope of features in an orthogonal image set.

9 A system as claimed in any preceding claim and wherein signals
representing each pixel in each image produced by the sensors are
applied to a microprocessor with a program algorithm to produce a
corresponding epipolar or orthogonal image.

10 A system as claimed in any preceding claim and wherein a linear
array of sensors is used and the comparator or comparison means
includes a series of D type registers.

11 A system as claimed in claim 9 and wherein said program
algorithm produces range data from pixel intensity variance with pixel
offset from each sensor's line of sight.